EFFICIENCY ESTIMATION IN PROGRAM SYNTHESIS


This paper describes a system for using efficiency knowledge in program synthesis. The system, called LIBRA, uses a
combination of knowledge-based rules and algebraic cost estimates to compare potential program implementations.
Efficiency knowledge is used to control the selection of algorithm and data structure implementations and the
application of optimizing transformations. Prototypes of programming constructs and of cost estimation techniques
are used to simplify the efficiency analysis process and to assist in the acquisition of efficiency knowledge associated
with new coding knowledge. LIBRA has been used to guide the selection of implementations for several programs
that classify, retrieve information, sort, and generate prime numbers.


1. INTRODUCTION


Efficiency considerations often impose conflicting demands on a
program synthesis system. On the one hand, a synthesis system
must produce an efficient target language program; on the other,
it must produce that target code in a reasonable amount of time
and without running out of storage. This paper discusses a
system that takes a middle ground between the extremes of 1)
constructing all possible programs that meet the specification and
picking the most efficient, and 2) using default implementations.
The system, called LIBRA, uses a knowledge base of efficiency
rules to guide the construction of relatively efficient target
language programs in a reasonable amount of time. LIBRA works
from a more abstract specification and considers a wider range
of target-language implementations than optimizing compilers.
Many choices must be made, and making a good decision
depends on a global view of the program. The target
programs are not guaranteed to be optimal, but the
efficiency knowledge is designed to allow the flexibility of trading
off target-program efficiency for speed and compactness
in the synthesis process.


The basic paradigm is heuristic search through a set of more and
more complete program descriptions. Estimates of the
execution costs of program implementations are used as
evaluation functions in the search. Symbolic, algebraic
analysis is used to estimate the execution costs.
Knowledge about the time and storage costs of data
structures and operations is used to choose combinations of
algorithms and data representations and to control the
application of optimizing transformations. Rules about
plausible implementations are used to prune the search tree.
LIBRA has been used to guide the construction of
several variants of programs that retrieve information, sort,
classify, and generate prime numbers.


2. BACKGROUND 


LIBRA is an extension of an interactive program synthesis system
that generates implementations in a target language by a series
of transformations and refinements of program descriptions,
called coding rules. The knowledge base of coding rules was
developed by Barstow [1]. The knowledge base allows programs
in the area of symbolic processing to be specified in terms of
constructs including sets, mappings, set operations, and
enumeration. The knowledge in both the coding rules and
efficiency rules permits the construction of programs using lists,
arrays, hash tables, property lists, and several enumeration,
sorting, and searching constructs. The target programs are
written in a subset of INTERLISP.


Most of the rules are not specific to the target language. For
example, there are 5 or 10 rules that gradually refine a set into
a hash table, and then a few language-specific rules for refining
the hash table into LISP. Although the general paradigm is
refinement from abstract to more detailed program descriptions,
transformations such as combining nested blocks of code or
nested loops are also allowed. LIBRA decides whether or not to
apply such a transformation just as it decides which of several
refinements to apply, by looking at the global execution cost
estimates or by applying heuristics.


LIBRA and the coding rules function together both as the
synthesis phase of the PSI program synthesis system [2] and as
an independent synthesis system. Figure 1 shows a simplified
view of the synthesis phase and its relation to the rest of PSI.
The other modules of the PSI system allow the description of
programs by English dialogue or by examples or traces, and
translate the specification into a complete high-level language
description. A specification in this high-level language can also
be given directly to the synthesis phase.

Figure 1. A LIBRA's eye view of program synthesis in PSI


3. PREVIEW


LIBRA chooses from among applicable refinements in the
knowledge base of coding rules through additional sets of rules
that can be easily modified. For example, rules about planning,
derived from previous analyses of how to make particular
implementation decisions, reduce the effort at explicitly
constructing and comparing alternative implementations. Related
decisions are grouped to reduce the size of the search space and
to make cost tradeoffs more obvious. Rules about scheduling and
resource allocation set priorities that reflect the importance of a
coding decision and the effort expended in making the choice.

When appropriate, alternate implementations are explicitly
constructed and compared analytically. The comparisons use
global cost estimates to reflect the interdependence of decisions.
The cost estimations can be made at any stage of the refinement
process, although estimates of more completely refined programs
are generally more accurate. LIBRA computes upper and lower
bounds on the estimated execution cost and uses them for
pruning program implementations with branch and bound. These
bounds are also useful in identifying parts of the program that
might lead to bottlenecks. Refinement resources are then
concentrated on those parts of the program.


Since the knowledge base is modular, it facilitates the
acquisition of new programming knowledge. The same prototypes
of programming constructs and of cost estimation procedures that
simplify the efficiency analysis process are also quite useful in
adding the efficiency information to match the coding knowledge
that is in the system. A semi-automated process for adding new
efficiency information has been developed.


The focus of this article is on the overall efficiency framework
and on the knowledge-based aspects of LIBRA. More details on
the analysis procedures and on other topics only covered briefly
here can be found in [3].


4. THE PROBLEM


The question addressed here is how to select an efficient
implementation for a high-level program specification, given a set
of rules for constructing the possible implementations. It is
assumed that there may be a very large number of possible
implementations and that it is not possible to construct and
compare all possibilities explicitly. The design goal was to
produce a system that would automatically select implementations
and that would be compatible with the refinement paradigm for
program synthesis.


The following example illustrates the type of problem that LIBRA
solves. The problem is to synthesize a good implementation of a
simple database retrieval program.


The program first inputs a database of news stories. It
then loops, accepting a keyword command and printing a
list of all stories in the database that contain that
keyword, alphabetized by story name. The special
keyword "xyzzy" causes the program to terminate.


As part of the program specification, the user may specify
information such as the estimated number of times a keyword
command will be given, the expected number of stories in the
database, and the average number of keywords per story. Some
variations of this example are developed further in later sections.


4.1 Implementation issues


Given a high-level program description, there are several types
of implementation issues to be considered:


-- choosing data structure representations
-- implementing high-level operations
-- applying optimizing transformations


Some of the major difficulties in resolving these issues arise from
the need to consider:


-- time and space trade-offs
-- dependencies among decisions
-- efficiency of target program versus
   efficiency of synthesis


Thus, in the program described above, a representation for the
database must be chosen, and a method for finding the stories
associated with the keyword must be chosen. If there is an
opportunity to apply a transformation such as combining two
loops, it must be determined whether that transformation will
actually improve the performance of the target program.


Often there is no data representation that minimizes both space
and time. In the news retrieval example, the database can be
represented as a mapping from stories to sets of keywords.
Unless the database is relatively small, it will take quite some
time to search for all the stories containing the given keyword
and to sort that list. Another possibility is to use an additional
representation of mappings from keywords to a sorted list of
stories containing that keyword. If keyword searches are
requested frequently, this would improve the running speed, but
at the expense of additional storage space.


When more than one data structure is involved, it may not be
possible to make implementation decisions independently. Given
most cost functions, there will be cross-product terms involving
the space from one representation and the time from an
operation on another. For example, this could happen if the cost
function were the product of 1) execution time of a statement, 2)
number of executions, and 3) total storage in use, summed over all
statements in the program. These cross-product terms make it
impossible to analyze the costs of the decisions independently.
The best implementation choice also depends on the relative
frequency of the retrieval operations and the sizes of the data
structures.
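
The interaction can be made concrete with a small sketch. It is illustrative only: the storage and time figures in DB_OPTIONS and KW_OPTIONS are invented, the names are hypothetical, and program_cost simply implements the example cost function mentioned above (statement time, times number of executions, times total storage in use, summed over statements).

```python
from dataclasses import dataclass


@dataclass
class Statement:
    name: str
    time_per_exec: float  # hypothetical time units per execution
    executions: int


def program_cost(statements, total_storage):
    """Sum over statements of time * executions * total storage in use."""
    return sum(s.time_per_exec * s.executions * total_storage for s in statements)


# Invented figures: representation -> (storage used, time of the operation on it)
DB_OPTIONS = {"list": (1, 50.0), "hash": (4, 1.0)}
KW_OPTIONS = {"list": (2, 100.0), "hash": (8, 1.0)}


def news_cost(db_choice, kw_choice, iterations=300):
    """Cost of one pair of representation choices under the product cost function."""
    db_space, db_time = DB_OPTIONS[db_choice]
    kw_space, kw_time = KW_OPTIONS[kw_choice]
    statements = [
        Statement("retrieve", db_time, iterations),
        Statement("member", kw_time, iterations),
    ]
    # Storage from BOTH structures multiplies the time of EVERY statement,
    # which is where the cross-product terms come from.
    return program_cost(statements, db_space + kw_space)
```

With these invented figures, switching KEYWORDS from a list to a hash table raises the cost when DATABASE is a list and lowers it when DATABASE is a hash table, so neither decision can be settled on its own.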


4.2 Some subproblems


Some subtasks of the general problem of finding an efficient
implementation include applying the efficiency knowledge needed
to:


1) analytically estimate and compare execution costs
One way to choose a good implementation is to make several
alternative refinements, estimate the costs of the resulting
program implementations, and choose the best one.


2) store and apply previous efficiency analysis results

To avoid expensive analysis, it is helpful to be able to exploit the
results of previous analyses. So there should be a mechanism
for adding rules such as:


"In refining a set that has more than 30 elements and that
is used only to test membership and add and delete
elements, the hash-table representation is a good choice."


"In refining a sequentially represented set in which
elements are frequently inserted and deleted, use a linked
list rather than an array." (This avoids shifting.)
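
As a rough illustration of how such stored results might be kept as data, the two quoted rules can be written as small condition functions over a description of how a set is used. This is a sketch, not LIBRA's rule language: SetUsage, its fields, and the operation names are hypothetical, and the threshold of 30 is taken from the first rule as best it can be read.

```python
from dataclasses import dataclass


@dataclass
class SetUsage:
    expected_size: int
    uses: frozenset                      # operations applied to the set
    sequential: bool = False             # already refined to a sequence?
    frequent_insert_delete: bool = False


def hash_table_rule(usage):
    """More than 30 elements, used only for membership, add, and delete."""
    allowed = {"member", "add", "delete"}
    if usage.expected_size > 30 and usage.uses and usage.uses <= allowed:
        return "hash-table"
    return None


def linked_list_rule(usage):
    """Sequential set with frequent insertion and deletion (avoids shifting)."""
    if usage.sequential and usage.frequent_insert_delete:
        return "linked-list"
    return None


RULES = [hash_table_rule, linked_list_rule]


def suggest(usage):
    """Return the representations recommended by every rule that fires."""
    return [r for r in (rule(usage) for rule in RULES) if r is not None]
```

A rule that does not fire contributes nothing, so new rules can be appended to RULES without touching the existing ones.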


3) concentrate effort on important parts of the program

The synthesis system should determine whether the
representation of the database has a greater effect on the global
program cost than the choice of alphabetizing technique, and
should use that information to focus synthesis resources.


4.3 Related research


Only some of the types of efficiency knowledge described in the
previous section have been codified for machine use. The
primary research has been in data-structure selection systems.
Some verification and theorem proving systems can prove facts
about the execution performance of programs, but they do not
use this information to guide program synthesis. The use of
efficiency knowledge in program synthesis has not been
addressed by debugging or analogy approaches.


The data-structure selection systems all use cost estimation for
comparison of implementations. Low [4] uses numerical cost
estimates to choose data structures from among a library of
implementations. To find branching probabilities, the system
inserts statement counts into a default implementation that is run
on sample data. Set sizes at different points in the program are
determined by querying the user. Morgenstern's system, a part
of PROTOSYSTEM-I [5], uses estimates of file input/output and
sort costs to choose file system organizations and order the
flow of processing operations in management information systems.


These systems include heuristics for avoiding complete search,
but the heuristics are not always expressed explicitly. Low's
system has a built-in rule for avoiding multiple representations
by forcing all data structures to have the same representation
throughout the program and by constraining all data structures
that are arguments to a common operation to share an identical
representation. Rovner [6] extended Low's work to the selection
of associative data structures and also allowed the selection of
redundant representations. Heuristics about when to consider
redundant representations and about other cost-tradeoff
assumptions were carefully noted in the description of the
system, but were not expressed as independent rules in the
system implementation.


Several different search strategies have been tested. Low and
Rovner use hill climbing among the estimated costs of the target
programs to choose an implementation. Morgenstern uses a
dynamic programming algorithm specifically tailored to choose
structures for large files. Wegbreit [7] gives some examples of
the use of performance analysis to drive a program
transformation process. LIBRA represents its resource-management
strategy in rules. One of the rules, which suggests
consideration of the high potential impact decisions first, is
similar to the techniques used by Wegbreit and Morgenstern.


Several other approaches to the problem of data structure
selection have been taken. The SETL project [8] uses a more
traditional optimizing compiler approach to choose set
representations based on a small set of alternatives. The
systems described in [9] and [10] attempt to match modelling
structures with the user's needs. An unsolved problem in this
approach is how to combine several modelling structures into one
representation.


5. A FRAMEWORK FOR EFFICIENCY ESTIMATION 


LIBRA was designed to explore the feasibility of combining
analytic and knowledge-based approaches to efficiency
estimation. The basic idea in the framework is heuristic search
through a tree of partially implemented program descriptions.
Efficiency rules from LIBRA are used to control the search and to
add efficiency-analysis information to the program description.
Coding rules from Barstow's knowledge base are used to refine
the program description into a more concrete description.


The root node of the search tree is the initial program
specification and the leaf nodes are target language programs.
Each of the intermediate nodes is a partially implemented version
of the entire program. The order in which refinements are
considered affects the subtree that is constructed. The focus of
attention for refinement may be limited to a particular part of the
program, but comparisons between nodes are based on global
execution costs. The tree of partial program implementations,
each with an agenda of synthesis tasks, serves as a workspace
for recording the state of the search (see Figure 2 below).


[Figure 2 diagram. Recoverable labels: "Workspace: Tree of partially
implemented programs with task agendas"; the legend distinguishes
data flow from control flow.]


Figure 2. Overview of efficiency framework.


A somewhat simplified description of the search strategy is: pick
a program implementation to work on, pick a refinement task
within that implementation, pick a coding rule to achieve that
task, and finally apply the coding rule and any associated
efficiency rules.
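
The four steps can be sketched as a best-first loop over partial implementations. This is a toy under stated assumptions, not LIBRA's control structure: a partial implementation is reduced to a dictionary of choices plus an agenda of remaining tasks, tasks are taken in agenda order, and the per-choice costs in COSTS are invented.

```python
import heapq
from itertools import count


def synthesize(spec_tasks, coding_rules, estimate):
    """Best-first search over partially implemented program descriptions.

    spec_tasks   -- tasks still to be refined, in agenda order
    coding_rules -- task -> list of alternative refinements
    estimate     -- function(choices) -> numeric cost estimate
    Returns the first fully refined description popped from the frontier.
    """
    tie = count()  # tie-breaker so the heap never compares dictionaries
    frontier = [(estimate({}), next(tie), {}, tuple(spec_tasks))]
    while frontier:
        # 1) pick a program implementation to work on
        _, _, choices, agenda = heapq.heappop(frontier)
        if not agenda:
            return choices
        # 2) pick a refinement task within that implementation
        task, rest = agenda[0], agenda[1:]
        # 3) pick a coding rule, and 4) apply it, once per alternative
        for option in coding_rules[task]:
            refined = {**choices, task: option}
            heapq.heappush(frontier, (estimate(refined), next(tie), refined, rest))
    return None


# Hypothetical rules and invented costs, for illustration only.
CODING_RULES = {
    "DATABASE": ["list-of-pairs", "hash-table"],
    "KEYWORDS": ["list", "hash-table"],
}
COSTS = {
    ("DATABASE", "list-of-pairs"): 50,
    ("DATABASE", "hash-table"): 5,
    ("KEYWORDS", "list"): 100,
    ("KEYWORDS", "hash-table"): 3,
}


def toy_estimate(choices):
    return sum(COSTS[(task, option)] for task, option in choices.items())


result = synthesize(["DATABASE", "KEYWORDS"], CODING_RULES, toy_estimate)
```

In this toy the evaluation function does all the guiding; in the framework described here, the choice of implementation and task is itself made by rules.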


Search-resource-management rules choose a program
implementation and then a part of that program to work on.
These rules assign priorities to tasks to ensure that the tasks are
carried out within the limits of the resources.


When refining a part of a program, all relevant coding rules are
retrieved and tested for applicability. Plausible-implementation
rules are used to help decide which coding rule to apply. These
rules contain precomputed analyses and are used to restrict the
possible coding rules to those that seem reasonable in the given
program situation, thus pruning the search tree.


Sometimes several coding rules seem plausible. Separate
program descriptions are set up and refined, then compared
using the cost estimates determined by cost-analysis rules.
Search-resource-management and plausible-implementation rules
may call on the cost-analysis rules for symbolic execution cost
estimates to compare different implementations and identify
potential bottlenecks in the target program execution.


5.1 Assigning priorities to decisions


Since all implementations cannot be considered in equal detail,
the quality of the decisions depends on the order in which they
are considered and the depth to which the consequences are
explored before making a commitment. The search-resource-management
rules use scheduling and resource allocation to
balance the final program performance with the cost of choosing
and constructing the implementations.


Task-ordering rules determine the ordering for attempting
different refinement tasks. Ordering principles include expanding
complex programming constructs, such as "SUBSET", early to
expose choices, and postponing choices of refinement rules and
low-level coding details until the major decisions have been made.


Choice-ordering rules find an order for considering the decisions
that must be made. One of these rules suggests allocating the
most resources to the decisions that are likely to lead to
bottlenecks and making those decisions first. Section 5.3
describes how these high potential impact decisions are
identified. LIBRA makes an adjustment to the potential impact of
a decision to reflect the accuracy of cost estimates for the
current level of program development and the expected cost of
completing the refinement process. Without this, a highly refined
implementation might be abandoned in favor of a very abstract
description with a slightly better optimistic estimate that is
probably not achievable.


5.2 Applying plausible-implementation rules


The plausible-implementation rules in LIBRA describe the
situations under which data structure implementations are
appropriate, when different sorting operations are plausible, and
when to consider using more than one representation for a data
structure. This knowledge is used to compare implementations
without the expense of explicit construction and evaluation of
execution costs of all alternatives.


The plausible-implementation rules are structured condition-action
rules. The condition of a rule about data structures, for
example, states all the critical uses of a data structure that make
the rule relevant. Efficiency information such as the size of a
data structure and the number of executions of a statement may
be used in the rule condition. The rule action can set a Boolean
combination of constraints for a set of program parts requiring
that they be refined (or not refined) to a particular programming
construct. A three-valued logic (satisfied, impossible, possible) is
used to check constraints.
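
A minimal sketch of three-valued constraint checking follows. The three status names come from the text; everything else is an assumption made for illustration: the representation of a program part as a dictionary with "current" and "reachable" entries, and the min/max (Kleene-style) way of combining statuses, which the text does not specify.

```python
from enum import Enum


class Status(Enum):
    IMPOSSIBLE = 0
    POSSIBLE = 1
    SATISFIED = 2


def check_refined_to(part, construct):
    """Check the constraint 'this part is refined to the given construct'.

    part -- {"current": construct it is now,
             "reachable": constructs it could still be refined to}
    """
    if part["current"] == construct:
        return Status.SATISFIED
    if construct in part["reachable"]:
        return Status.POSSIBLE
    return Status.IMPOSSIBLE


def negate(status):
    """'Not refined to X': satisfied and impossible swap, possible stays."""
    return Status(2 - status.value)


def all_of(*statuses):
    """Conjunction: as weak as the weakest constraint."""
    return Status(min(s.value for s in statuses))


def any_of(*statuses):
    """Disjunction: as strong as the strongest constraint."""
    return Status(max(s.value for s in statuses))


# Hypothetical program parts for illustration.
keywords = {"current": "set", "reachable": {"linked-list", "hash-table", "array"}}
domain = {"current": "linked-list", "reachable": set()}
```

A coding rule whose result would make some constraint IMPOSSIBLE can be rejected at once, while POSSIBLE leaves the decision open for later refinement steps.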


5.3 Estimating execution costs


LIBRA includes a knowledge base of rules for estimating the
execution cost of a program description at any stage of the
refinement process and with varying degrees of accuracy. The
user is expected to provide some basic information about the
program, and then LIBRA keeps the analysis updated for the rest
of the refinement process. For example, in the NEWS program
the basic information needed is the expected number of stories,
the average number of keywords per story, and the number of
times the main loop in the program will be executed for a given
database. LIBRA then makes analysis transformations in parallel
with refinements so that more accurate cost estimates can be
associated with succeeding nodes in the tree. Some analysis
rules are associated with particular coding transformations. Many
rules, such as those for analyzing Boolean combinations, are
associated with coding constructs rather than transformations.
Information about parameters such as data structure sizes,
statement running times and execution frequencies, and data
structure usage information is maintained.


The top-down, incremental analysis allows programs to be
analyzed that would be difficult to analyze automatically if only
the target program were presented. An advantage of combining
the stepwise refinement with this sort of analysis is that classes
of implementations can be compared by considering the cost
estimates for intermediate program descriptions rather than
explicitly expanding the tree and comparing the target language
programs.


Estimating execution costs is not an exact science. LIBRA attacks
the problem by using both upper and lower bounds on the
execution cost. The upper bound, or achievable estimate, is
calculated by introducing a standard implementation for each of
the programming constructs used and by assuming that standard
implementation choices are made for the rest of the refinement
process. The lower bound, or optimistic cost estimate, is based
on a lower bound for implementations known to the program, not
a theoretical lower bound. Global optimistic cost estimates are
estimated by assuming optimistic costs for each of the constructs
in the program and by assuming that no representation conflicts
occur.


The importance of a decision is measured by its potential impact.
This is the difference between the achievable bound cost estimate
and the execution cost estimated when optimistic cost estimates
are used for all parts of the program involved in the decision.
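
The two bounds and the potential impact measure can be illustrated with a short sketch. It reads potential impact as the gap between the achievable estimate and the estimate obtained when only the parts involved in the decision take their optimistic costs. The function names and the per-execution unit costs are invented; the execution counts borrow the NEWS figures of 50 stories and 300 loop iterations, and the unit costs of 50 and 100 stand in for operations linear in 50 stories and 100 keywords.

```python
def achievable(parts):
    """Upper bound: every construct uses its standard implementation."""
    return sum(p["standard"] * p["executions"] for p in parts.values())


def optimistic(parts):
    """Lower bound: every construct uses the cheapest known implementation."""
    return sum(p["optimistic"] * p["executions"] for p in parts.values())


def potential_impact(parts, decision_parts):
    """Achievable cost minus the cost when only the parts involved in the
    decision are given their optimistic costs."""
    best_case = sum(
        (p["optimistic"] if name in decision_parts else p["standard"]) * p["executions"]
        for name, p in parts.items()
    )
    return achievable(parts) - best_case


# Invented unit costs; 15000 = 50 stories * 300 iterations.
NEWS_PARTS = {
    "retrieve": {"standard": 50, "optimistic": 1, "executions": 15000},
    "member": {"standard": 100, "optimistic": 1, "executions": 15000},
}
```

Under these assumptions the membership test has roughly twice the potential impact of the retrieval, so the representation behind it would be the decision to examine first.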


A general model of program constructs and specific models for
each construct are used to organize the cost estimation process.
Also, a standard cost-computation process allows sharing of
subroutines between estimation strategies for making quick
estimates and for performing more detailed (and usually more
expensive) analysis.


6. AN EXAMPLE 


This section will consider the implementation of a retrieval
program in more detail. The problem to be implemented, called
NEWS, is:


Read in a database of news stories. The DATABASE is a
mapping from stories to sets of KEYWORDS. Repeatedly
accept a keyword and print out a list of the names of
the stories in the database that contain that keyword.
When the special command "xyzzy" is given instead of a
keyword, then halt.


LIBRA has directed the implementation of several versions of
NEWS. Under different assumptions about the size of the
database or the cost function to be used, different
implementations are selected. Figure 3 below shows the tree of
implementations that is generated and searched under certain
assumptions about data structure sizes and branch probabilities.
The major choices to be made in implementing NEWS are choosing
representations for the DATABASE mapping and for the
KEYWORDS set.


[Figure 3 diagram. Each node carries a pair of cost bounds in the
original; most of the values are not legible in this copy.
Recoverable structure:

  A: NEWS
  choice of KEYWORDS set representation:
      B: list of KEYWORDS
      C: mapping of KEYWORDS
  choice of DATABASE mapping representation:
      D: list of pairs <STORY, KEYWORDS>
      E: hash table of STORY -> KEYWORDS
      F: property-list entry
  choice of KEYWORDS mapping representation:
      G: hash table of KEYWORDS
      H: property-list entry of KEYWORDS]


Figure 3. Overview of NEWS implementation.


6.1 Alternate implementation paths


A number of ways to implement NEWS are possible with the
current set of coding rules. One refinement path, node G in the
search tree of Figure 3, is followed through in more detail in the
following sections. It involves representing DATABASE internally
as a hash table of stories, with each story in turn having a hash
table of keywords. The cost function used in this case is the
product of running time and number of pages in use. LIBRA
chooses a hash-table representation for KEYWORDS because
there are many keywords for each story. The time to convert
the set of keywords into a hash table is balanced by the time
savings from the membership test, which is faster as a hash-table
look-up than as a search through the list of keywords (for large
keyword sets). The DATABASE representation decision is similar.
Both choices are reinforced by the fact that the main loop is
executed many times before exiting with "xyzzy."


Under other assumptions, a path through node B is taken and a
linked-list representation is selected. If the loop is executed only
a few times or if the number of keywords associated with a story
is small, then the time required to convert the database from the
list of pairs (<story, keywords>) representation to a hash-table
representation is not outweighed by the fast hash-table look-up
operations. If space is a critical factor in the cost function,
another path through B is taken in which the original
representation of a list of pairs is preserved. This avoids using
any additional space, but at a cost in time.
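
The time side of this trade-off can be sketched as a break-even test. The function below is a rough illustration, not LIBRA's analysis: the name hashing_pays_off and the three unit costs are invented, a list search is modeled as scanning half the list on average, and space (which the cost function in this example also weighs) is ignored.

```python
def hashing_pays_off(keywords_per_story, stories, iterations,
                     compare_cost=1.0, hash_lookup_cost=2.0, hash_insert_cost=3.0):
    """Return True when converting keyword lists to hash tables lowers total time.

    All unit costs are hypothetical defaults chosen for illustration.
    """
    # One membership test per story per loop iteration.
    tests = stories * iterations
    # A list search is modeled as scanning half the keywords on average.
    time_with_lists = compare_cost * keywords_per_story / 2 * tests
    # Hashing pays a one-time conversion, then a constant-time look-up per test.
    conversion = hash_insert_cost * keywords_per_story * stories
    time_with_hash = conversion + hash_lookup_cost * tests
    return time_with_hash < time_with_lists
```

With 100 keywords per story, 50 stories, and 300 iterations the conversion is repaid many times over; with a single iteration, or with only three keywords per story, it is not.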


A different tree than the one pictured in Figure 3 may also be
searched. Suppose there are only a few keywords per story,
many stories, and a cost function dominated by running time.
Then the representation of the DATABASE mapping is a more
critical decision than the KEYWORDS set representation, because
the time for the membership test would not differ greatly for the
different representations. If fewer resources are available for
synthesis than in the examples described above, then some of the
less reliable plausible-implementation rules are used. For
example, nodes F and H are not considered when a
plausible-implementation rule that prefers hash-table
representations to property-list entries is applied.


The implementations that LIBRA chooses in this case are about
the best possible with the current set of coding rules. People
can do better on the NEWS example by using representations
outside the scope of the coding rules. However, for any given
set of coding rules, allowing people to make the decisions would
not produce better implementations.


6.2 Initial refinements in NEWS


The following sections show more details of the path leading to
node G. By questioning the user, LIBRA determines that the
expected number of stories in the database is 50, the average
number of keywords per story is 100, the expected number of
iterations of the loop is 300, and the probability that the
command is a keyword of the average story is 0.1.


LIBRA first calls on the coding rules to make refinements that do
not involve any decisions. For example, the input DATABASE is
refined to the standard input format for mappings, a list of pairs
<story, keywords>, and the set of KEYWORDS is refined into a
linked list. LIBRA applies plausible-implementation rules to decide
whether to consider multiple representations for the KEYWORDS
set and the DATABASE mapping.


During refinement, a "for all" statement enumerating the domain
of DATABASE is created. It is refined into an explicit
enumeration of the items of the domain, since only one coding rule is
applicable. To decide how to refine the enumeration, some
information about the representation of the domain is needed.
LIBRA does not consider all possible representations of the domain
set explicitly; the choice is made by the application of
plausible-implementation rules. For example, two of the efficiency rules
about sets are:


If the only uses of a set A are for enumerations over that
set, and if B is another representation for A that is easily
enumerable, then use the same representation for A as
for B.


If all uses of a set are for enumerations, or as pointers to
positions in the set, or as tests of the state of the
enumerations, and if the target language is LISP, then
refine the set into a linked list.


These rules determine that the domain set, which is used only for
enumeration and is not an alternate representation of some other
set, should be refined into a linked list. Therefore constraints
on the domain set are established, and it is refined into a
sequence, and then into a list (rather than an array), with the
choices between applicable coding rules resolved by the
constraints.


Some of the details of constructing the domain list and the
enumeration of the domain are postponed by search-resource-management
rules because LIBRA predicts that no decisions will
be involved and the cost estimate for that part of the program
will not change significantly. Other choices that arise and cannot
be resolved by plausible-implementation rules are also postponed
until other useful refinements are finished.


6.3 Identifying the most important decision


All of the changes above take place in node A of Figure 3.
During the refinement, several choices are postponed. These
choices are 1) how to refine the DATABASE mapping used inside
the for-all, and 2) how to refine the KEYWORDS set within that
mapping. What is the effect of each of the two choices to be
made in the example?


The internal representation of DATABASE, (DB1), is used for
retrieving the image value (keyword sets) of stories once per
story per command. Possible implementations for mappings range
from a linked-list format that makes retrieval linear in the number
of stories to associative structures that have nearly constant
retrieval time.


The keyword sets in DB1 (KEYWORDS1) are used in a
"member(command, KEYWORDS1)" test. This test is executed once
for each story for each iteration of the loop. Possible
implementations give membership tests with times ranging from
linear in the number of keywords to nearly constant.


Since the number of keywords is greater than the number of
stories, the keyword representation has the largest cost
differential and is more likely to be a bottleneck in the final
program if care is not taken in the representation choice.
According to the choice-ordering rule about making high potential
impact decisions first, the next step is to look at the possible
refinements of KEYWORDS1.


Decision-making resources are assigned. Currently the resources
measured are the CPU time used in carrying out the refinements
and the number of pages used by the refinement trees. The
resources needed to complete a program implementation without
making choices are estimated and subtracted from the total
available resources. Decision-making resources from the
remainder are assigned in proportion to the estimated importance
of the decision. Then, separate program descriptions are set up
(actually they share some substructure) in which each of the
alternate coding rules is applied. In this decision, the applicable
rules allow either refining the keyword set into an explicit set,
leading to search node B, or into an explicit mapping, leading to
search node C.
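
The allocation step can be sketched in a few lines. The function name allocate, the resource units, and the importance figures are all invented for illustration; only the policy (reserve what is needed to finish without choices, then split the remainder in proportion to estimated importance) comes from the description above.

```python
def allocate(total, completion_reserve, impacts):
    """Split decision-making resources in proportion to estimated importance.

    total              -- total resources available (e.g. CPU seconds)
    completion_reserve -- estimate of what finishing without choices needs
    impacts            -- decision name -> estimated importance
    """
    remaining = max(total - completion_reserve, 0)
    total_impact = sum(impacts.values())
    if total_impact == 0:
        return {name: 0.0 for name in impacts}
    return {name: remaining * impact / total_impact
            for name, impact in impacts.items()}


# Hypothetical figures: 1000 units in total, 400 held back for completion.
shares = allocate(1000, 400, {"KEYWORDS1": 2, "DB1": 1})
```

Here the decision judged twice as important receives twice the share of the 600 units left after the reserve.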


6.4 Exploring two implementations for KEYWORDS1


LIBRA's goal is to refine the alternatives (B and C) enough so that
the comparison among implementations will be informative. The
resources previously assigned give upper limits on the time and
space to be spent on getting a more accurate estimate of the
program cost of the implementation being explored. Each
program description also has a "purpose" to be fulfilled, which
serves as a test of whether the task has been achieved and is
used to set some of the task and choice-ordering strategies.
There is also a set of program parts that is to be the focus of
attention of processing. In this case, the KEYWORDS data
structure and the representation conversion and the membership
test are included in the focus set.


In the first program description, search node B, the explicit-set
rule is applied and refinement proceeds until all relevant tasks
are satisfied -- the resources allowed for writing the program
are generous in this example. At the conclusion, the keyword set
for each story has been refined, after the application of several
coding rules, into a LISP list, and the membership operation has
been refined into a list search.


Refinement of search node C, the program description in which
the explicit-mapping rule was applied, also halts because all
relevant tasks have been accomplished. Here the keyword set is
refined to a mapping and membership is tested by seeing if there
is a mapping for the given key. There is also a representation
conversion, since the keyword set is represented as a list in the
input.
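
The two alternatives can be contrasted with a small sketch. It is written in Python rather than the LISP that LIBRA targets, and it uses a Python dict as a stand-in for the mapping, which is an assumption: at this stage of the refinement the mapping's concrete representation has not yet been chosen. All function names are hypothetical.

```python
def member_list(key, keywords):
    """Node B style: membership by searching a list, linear in its length."""
    for k in keywords:
        if k == key:
            return True
    return False


def to_mapping(keywords):
    """Representation conversion: the input arrives as a list and must be
    converted once into a mapping keyed by keyword."""
    return {k: True for k in keywords}


def member_mapping(key, mapping):
    """Node C style: membership by asking whether the key has a mapping."""
    return key in mapping


story_keywords = ["oil", "tax", "election"]   # hypothetical input list
as_mapping = to_mapping(story_keywords)       # one-time conversion cost

print(member_list("tax", story_keywords))     # True
print(member_mapping("tax", as_mapping))      # True
print(member_mapping("war", as_mapping))      # False
```

The conversion is paid once per story, while the membership test is paid on every query, which is why the comparison depends on how often the test runs.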


LIBRA then computes optimistic and achievable bounds on the
cost of the whole program for each program description. In the
linked-list implementation, B, the optimistic estimate is 15000
millisecond-pages, and the achievable bound is 48000. The
optimistic and achievable cost estimates for the mapping
representation, C, are [illegible] and 40000, respectively. Branch
and bound is applied to eliminate any implementations with
optimistic estimates worse than the achievable estimate of some
other implementation. Neither implementation is eliminated in
this case, though later in the refinement of NEWS the technique
will be fruitful. Node C has the best optimistic estimate and is
chosen for further refinement.
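
The pruning step can be sketched as below. This is a minimal illustration of branch and bound over (optimistic, achievable) cost pairs, not LIBRA's code. The node names and cost figures are hypothetical and are not the values from the NEWS example.

```python
def prune(bounds):
    """Drop every alternative whose optimistic (lower) cost bound is
    worse than the achievable (upper) bound of some other alternative.

    bounds -- dict: name -> (optimistic_cost, achievable_cost)
    """
    survivors = {}
    for name, (optimistic, _) in bounds.items():
        other_achievable = [ach for other, (_, ach) in bounds.items()
                            if other != name]
        if all(optimistic <= ach for ach in other_achievable):
            survivors[name] = bounds[name]
    return survivors


def choose_next(bounds):
    """Pick the surviving alternative with the best optimistic estimate."""
    return min(bounds, key=lambda name: bounds[name][0])


# Hypothetical bounds in arbitrary cost units.
candidates = {"X": (10, 50), "Y": (5, 40), "Z": (60, 90)}
alive = prune(candidates)
print(sorted(alive))        # ['X', 'Y']
print(choose_next(alive))   # Y
```

Here Z is eliminated because even its best case (60) is worse than what Y is already known to achieve (40). X and Y overlap, so both survive and the one with the better optimistic bound is refined next.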


6.5 Refining the rest of NEWS


The remaining decisions are choosing a refinement for the explicit
mapping of KEYWORDS and choosing a refinement for DB. The
database decision is chosen by the potential impact method.
Three program descriptions are set up to consider the three
applicable refinement rules -- one to consider refining the
mapping to a list of pairs (search node D), one to consider a
stored mapping (node E), and one to consider a distributed
mapping (node F). The relevant parts of the program, those
related to the DB decision, are then refined in each program
description. For example, the stored mapping is refined to a hash
table. The resulting program descriptions are then compared
with each other and with other program descriptions that have
been temporarily abandoned, such as search node B. As
Figure 3 shows, nodes B and D can be eliminated from further
consideration because even their lower bounds are worse than the
achievable bound on node E. The most promising implementation,
search node E, is then chosen and refinement continues.


The final decision to be made is how to represent the
KEYWORDS set, which has been refined into a mapping. As in
the refinement of node C, there are three applicable coding rules.
However, there is an applicable plausible-implementation rule
about mappings that eliminates one of the possibilities.


If a mapping has already been refined from a set, then do
not refine it into a set of pairs.


Thus, only two coding rules are considered. These rules are both
tested, in search nodes G and H. The stored mapping, leading to
the hash table representation in node G, proves to be the best
choice. At this point, the cost estimate is precise enough to
eliminate all the other possibilities. Thus, the best possibility is
the implementation of both the keyword set and the mapping DB
as hash tables. As refinement continues, several other choices of
coding rules are presented, but they are all resolved by
plausible-implementation rules. The decisions made include
choosing to recompute rather than store values that are easy to
compute. The program description is finally refined into a LISP
program.
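
One way to picture how a plausible-implementation rule narrows the candidate coding rules is as a veto predicate applied before any alternative is explored. The sketch below is an assumption about form only: the `Construct` record, the candidate labels, and the function names are hypothetical and do not reflect LIBRA's internal rule representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    kind: str          # e.g. "mapping"
    refined_from: str  # kind of the construct it was refined from, or ""


# Hypothetical labels for the three applicable coding rules.
CANDIDATES = ["pairs", "stored", "distributed"]


def no_pairs_after_set(construct, candidate):
    """Veto the pairs refinement for a mapping that came from a set."""
    if construct.kind == "mapping" and construct.refined_from == "set":
        return candidate != "pairs"
    return True


def plausible(construct, candidates, rules):
    """Keep only the candidates that no rule vetoes."""
    return [c for c in candidates
            if all(rule(construct, c) for rule in rules)]


keywords = Construct(kind="mapping", refined_from="set")
print(plausible(keywords, CANDIDATES, [no_pairs_after_set]))
# ['stored', 'distributed']
```

Only the two surviving candidates then need separate program descriptions and cost estimates, which matches the two search nodes explored in the text.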


7. KNOWLEDGE ACQUISITION AIDS


LIBRA includes mechanisms to assist in the acquisition of new
programming constructs, including the additions that are made to
efficiency knowledge when new coding knowledge is added.
When new high-level constructs are added, such as new types of
sorts or trees, new efficiency knowledge is needed to analyze
these constructs, their subparts, running times, and other
efficiency properties. LIBRA's prototypes of programming
constructs are consulted by acquisition-aid routines when new
constructs are added. Some of the necessary information can be
deduced automatically, and the user is asked specific questions to
obtain the rest.


Estimates of running time and space usage depend on the target
language and target computer. LIBRA provides a semi-automatic
procedure for deriving cost estimation functions from the set of
functions for the target language constructs. This procedure can
be used to update efficiency rules when new coding rules are
added. Currently only time-estimating functions are derived, but
a similar process could be used to check the accuracy of the
plausible-implementation rules in the system when new coding
knowledge is added.


8. CONCLUSIONS AND FUTURE DIRECTIONS


The use of efficiency estimation in program synthesis is a new
but promising area. The issue of data-structure selection
has been studied in some detail, but not the issue of estimating
the effects of applying high level program transformations.
LIBRA provides a framework in which both data-structure and
algorithm selection can be treated. The heuristics that suggest
orderings for considering refinement tasks and decisions,
and that suggest plausible implementations and when
to consider multiple implementations, are expressed explicitly
as rules. A start has been made on symbolic algorithm
analysis, and incremental analysis is used to make the analysis
process tractable. One of the goals in LIBRA is to break up
the programming process into manageable chunks in order to
learn more about the sequences of implementation
choices available, how the choices interact, and when and how
the choices should be made.


To extend LIBRA to a complete automatic programming system,
additional research would be needed. For example, to write
more complex programs such as compilers or operating
systems, more coding and efficiency rules about constructs such
as bit-packing, machine interrupts, and multiprocessing would
need to be added to the system. However, the efficiency
techniques described here should be sufficient to control
combinatorial explosion.


Higher level optimizations, extended symbolic analysis and
comparison capabilities, and more domain expertise are some
feasible extensions to LIBRA. Another possibility is to
automate the checking of conditions in the heuristic rules by
doing a complete search through the current set of coding rules.
Automatic generation of heuristics based on analysis of symbolic
cost estimates would be another important addition. Adding
an inference process to both the coding and efficiency
estimation processes would also be useful, though not as
straightforward.


More powerful symbolic comparison techniques are also
possible. For example, the range of values for which one
implementation dominates another (c1*N^2 over c2*N) could be
determined. The user would then only have to say whether N
was within a particular range, rather than giving a definite
value. Another use of symbolic costs is in proposing
alternate solutions, each with the conditions that make the
solution the best choice. If, for example, the costs for primitive
operations such as multiply are given as ranges, the system
could produce the solution "Implementation X is best if the
target machine has a very fast multiply, but implementation
Y is best if multiplication takes about the same time as
addition."
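
As a worked illustration of determining such a range, assume the two symbolic costs have the forms c1*N^2 and c2*N with positive constants (the exact forms intended in the text are an assumption here). The quadratic implementation is then cheaper exactly when N < c2/c1. The function names and constants below are hypothetical.

```python
def dominance_threshold(c1, c2):
    """Return the N below which cost c1*N**2 is less than cost c2*N.

    For N > 0 and c1 > 0:  c1*N**2 < c2*N  <=>  N < c2/c1.
    """
    if c1 <= 0 or c2 <= 0:
        raise ValueError("cost coefficients must be positive")
    return c2 / c1


def cheaper(c1, c2, n):
    """Name the cheaper alternative at a concrete problem size n."""
    quadratic_cost = c1 * n ** 2
    linear_cost = c2 * n
    return "quadratic" if quadratic_cost < linear_cost else "linear"


# Hypothetical coefficients: a quadratic method with a small constant
# against a linear method with a large one.
threshold = dominance_threshold(c1=2.0, c2=100.0)
print(threshold)                  # 50.0
print(cheaper(2.0, 100.0, 10))    # quadratic
print(cheaper(2.0, 100.0, 100))   # linear
```

With a threshold in hand, the user only has to state whether N falls below or above it, rather than supplying an exact value.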


LIBRA has demonstrated the feasibility of the approach
described here, but has by no means exhausted the research
topics in efficiency estimation for program synthesis.


